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Multiplication of n-digit integers by long multiplication requires 0(n 2 ) operations and can be time- 
consuming. In 1970 A. Schonhage and V. Strassen published an algorithm capable of performing the 
task with only 0(n\og{n)) arithmetic operations over C; naturally, finite-precision approximations 
to C are used and rounding errors need to be accounted for. Overall, using variable-precision fixed- 
point numbers, this results in an (9(«(log(«)) 2+£ )-time algorithm. However, to make this algorithm 
more efficient and practical we need to make use of hardware-based floating-point numbers. How do 
we deal with rounding errors? and how do we determine the limits of the fixed-precision hardware? 
Our solution is to use interval arithmetic to guarantee the correctness of results and determine the 
hardware's limits. We examine the feasibility of this approach and are able to report that 75,000-digit 
base-256 integers can be handled using double-precision containment sets. This clearly demonstrates 
that our approach has practical potential; however, at this stage, our implementation does not yet 
compete with commercial ones, but we are able to demonstrate the feasibility of this technique. 

1 Introduction 

Multiplication of very large integers is a crucial subroutine of many algorithms such as the RSA cryp- 
tosystem [7 ]. Consequently, much effort has gone into finding fast and reliable multiplication algorithms; 
101 discusses several methods. The asymptotically-fastest known algorithm [1] requires nlog(n)2°( los *( n )) 
steps, where log* is the iterated logarithm — defined as the number of times one has to repeatedly take 
the logarithm before the number is less than 1. However, being asymptotically-faster does not translate to 
being faster in practice. We shall concern ourselves with the practicalities of the subject; we will analyse 
our algorithm's performance on a finite range of numbers. 

The algorithm we are studying here is based on the first of two asymptotically fast multiplication 
algorithms by A. Schonhage and V. Strassen [8]. These algorithms are based on the convolution theorem 
and the fast Fourier transform. The first algorithm (the one we are studying) performs the discrete Fourier 
transform over C using finite-precision approximations. The second algorithm uses the same ideas as the 
first, but it works over the finite ring Z 2 2» +1 rather than the uncountable field C. We wish to point out that 
"the Schonhage-Strassen algorithm" usually refers to the second algorithm. However, in this document 
we use it to refer to the first C-based algorithm. 

From the theoretical viewpoint, the second algorithm is much nicer than the first. The second al- 
gorithm does not require the use of finite-precision approximations to C. Also, the second algorithm 
requires 0(nlog(n)log(log(n))) steps to multiply two «-bit numbers, making it asymptotically-faster 
than the first algorithm. However, the second algorithm is much more complicated than the first, and it 
is outperformed by asymptotically-slower algorithms, such as long multiplication, for small-to-medium 
input sizes. In practice, both of the Schonhage-Strassen algorithms are rarely used. 
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The first Schonhage-Strassen Algorithm is more elegant, if the finite-precision approximations are 
ignored. More importantly, it is faster in practice. Previous studies (H have shown that the first algorithm 
can be faster than even highly-optimised implementations of the second. However, the first algorithm's 
reliance on finite-precision approximations, despite exact answers being required, leads to it being dis- 
counted. 

The saving grace of the Schonhage-Strassen algorithm is that at the end of the computation an integral 
result will be obtained. So the finite-precision approximations are rounded to integers. Thus, as long as 
rounding errors are sufficiently small for the rounding to be correct, an exact answer will be obtained. 
Shonhage and Strassen showed that fixed-point numbers with a variable precision of 0(log(n)) bits 
would be sufficient to achieve this. 

For the Schonhage-Strassen algorithm to be practical, we need to make use of hardware-based 
floating-point numbers; software-based variable-precision numbers are simply too slow. However, we 
need to be able to manage the rounding errors. At the very least, we must be able to detect when the 
error is too large and more precision is needed. The usual approach to this is to prove some kind of 
worst-case error bound (for an example, see [6 ]). Then we can be sure that, for sufficiently small inputs, 
the algorithm will give correct results. However, worst-case bounds are rarely tight. We propose the use 
of dynamic error bounds using existing techniques from computer-aided proofs. 

Dynamic error detection allows us to move beyond worst-case bounds. For example, using standard 
single-precision floating-point numbers, our naive implementation of the Schonhage-Strassen algorithm 
sometimes gave an incorrect result when we tried multiplying two 120-digit base-256 numbers, but it 
usually gave correct results. Note that by a 'naive implementation' we simply mean a modificiation of the 
Schonhage-Strassen algorithm that uses fixed-precison floating-point arithmetic and does not guarantee 
correctness. A worst-case bound would not allow us to use the algorithm in this case, despite it usually 
being correct. Dynamic error detection, however, would allow us to try the algorithm, and, in the rare 
instances where errors occur, it would inform us that we need to use more precision. 

We will use complex interval containment sets for all complex arithmetic operations. This means 
that at the end of the computation, where we would ordinarily round to the nearest integer, we simply 
choose the unique integer in the containment set. If the containment set contains multiple integers, then 
we report an error. This rigorous extension of the Schonhage-Strassen algorithm therefore constitutes a 
computer-aided proof of the desired product. When an error is detected, we must increase the precision 
being used or we must use a different algorithm. 

For those unfamiliar with the Schonhage-Strassen algorithm or with interval arithmetic, we describe 
these in section [2] Then, in section |3l we show the empirical results of our study. Section [4] our 
conclusion, briefly discusses the implications of our results. 

2 The Algorithm 

For the sake of completeness we explain the Schonhage-Strassen algorithm, as it is presented in (H. 
We also explain how we have modified the algorithm using interval arithmetic in subsection 12.51 Those 
already familiar with the material may skip all or part of this section. 

We make the convention that a positive integer x is represented in base b (usually b = 2 k for some 
k G N) as a vector* G := {0, 1,2, . .. ,b — 1}"; the value of ;cis 

n-l 

x = Y, x i b '- 

i=0 
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2.1 Basic Multiplication Algorithm 

The above definition immediately leads to a formula for multiplication. Let x and y be positive integers 
with representations x £ Z" b and y £ Z™. Then 

(ra-l \ /m-l \ n+m-2 min{n— n+m-l 
i'=0 / \j'=0 / (=0 y=max{0,;-m+l} i=0 

Of course, we cannot simply set z,- = L^maxfo Z-m+i}^'-^ tms ma y v i°l ate me constraint that < z, < 
b — 1 for every i. We must 'carry' the 'overflow'. This leads to the long multiplication algorithm (see 
0). 

The Long Multiplication Algorithm 



1 . Input : iGZj and y £ 

2 . Output : z £ Z£ +m # z = xv 

3. Set c = 0. # c = carry 

4. For i = up to n + m— 1 do { 

5. Set 5 = 0. # s = sum 

6. For 7 =max{0, i — m + l} up to min{n— 1,?} do { 

7 . Set 5 = s + Xjyj-j . 

8. }. 

9. Set Zi = (s + c) mod b. 

10. Set c= [{s + c)/b\ . 

11. }. 

12. # c = at the end. 



This algorithm requires 0(mn) steps (for a fixed ft). Close inspection of the long multiplication algorithm 
might suggest that 0(m«log(min{m,«})) steps are required as the sum s can become very large. How- 
ever, adding a bounded number (x ; y,_i < b 2 ) to an unbounded number (s) is, on average, a constant-time 
operation. 

2.2 The Discrete Fourier Transform 

The basis of the Schonhage-Strassen algorithm is the discrete Fourier transform and the convolution 
theorem. The discrete Fourier transform is a map from C" to C". In this section we will define the 
discrete Fourier transform and we will show how it and its inverse can be calculated with 0(n\og(n)) 
complex additions and multiplications. See 14] for further details. 

Definition 1 (Discrete Fourier Transform). Letx £ C" and let (0 :=e^ . Then define the discrete Fourier 
transform x £ C" of x by 

n-l 

x r = J^Xj(O ij (0< i<n-l). 

7=0 

There is nothing special about our choice of ft); the consequences of the following lemma are all that 
we need co to satisfy. Any other element of C with the same properties would suffice. 

2jri 

Lemma 2. Let n > 1 and CO = e « . Then 



ft)" = 1 and C0 k / 1 for all <k <n 
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and, for allO <k <n, 

n-l 

£ fi>* = 0. 

i=0 

Note that the case where n = 1 is uninteresting, as ft) = 1 and the discrete Fourier transform is the 
identity mapping x = x. 

Now we can prove that the discrete Fourier transform is a bijection. 
Proposition 3 (Inverse Discrete Fourier Transform). Let x G <C n and let co = e~ . Define ieC" by 

Y n—l 

^ ■= - £ ^e)~ y (0 < ? < n- 1). 

n 7=0 

7W« defines the inverse of the discrete Fourier transform — that is, ify = x, then y = x. 

Now we explain the fast Fourier transform; this is simply a fast algorithm for computing the discrete 
Fourier transform and its inverse. 

Let n be a power of 2 and x G C" be given. Now define x even ,x dd G C"/ 2 by 

(Xeven ) ; = x 2i i (Xodd ) ; = x 2i+ 1 j 

for all iwith0< i<n/2-l. 

Now the critical observation of the Cooley-Tukey fast Fourier transform algorithm is the following. 
Fix i with < i < n— 1 and let 00 = e~ . Then we have 

n-l 

Xi = Y. x i ( ° ii 

7=0 

n/2- 1 .. n/2-1 

= £ (Xeven^fV^ + G/ £ ^odd); (« 2 ) " 
7=0 7=0 

= (^even)j mod „/ 2 + ®' (*odd); mod „/2 • 

Note that (co 2 )'^ 2 = 1, so taking the modulus is justified. This observation leads to the following divide- 
and-conquer algorithm. 

The Cooley-Tukey Fast Fourier Transform 



1 . Input : n = 2 k and x G C" 

2 . Output : x G C" 

3. function FFT(&, x) { 

4. If k = 0, then x = x. 

5. Partition x into x even ,x odd G C"/ 2 . 

6. Compute x even = FFT(&— 1, x even ) by recursion. 

7. Compute x odd = FFT(&— 1, x odd ) by recursion. 

8 . Compute CO = e » . 

9 . For / = up to n—l do { 

10. Set Xi = (Xeven); mo d n/2 + (*odd); mo d n/2" 

11. }. 

12. }. 
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It is easy to show that this algorithm requires 0(nlog(n)) complex additions and multiplications. With 
very little modification we are also able to obtain a fast algorithm for computing the inverse discrete 
Fourier transform. 

Note that, to compute ft), we can use the recurrence 

£01 = 1, (02 = -I, C0 4 =i, Ohn = 17X771 ( W ^ 3 )' 

I J- + (On | 

where (o n = e^r . Other efficient methods of computing ft) are also available. 



2.3 The Convolution Theorem 

We start by defining the convolution. Let a,b G C". We can interpret a and b as the coefficients of two 
polynomials — that is, 

fa(z) =a + aiz-\ han-iz" -1 . 

The convolution of a and b — denoted by a * b — is, for our purposes, the vector of coefficients obtained 
by multiplying the polynomials f a and fy. Thus we have f a *b(z) = f a {z)fb{z) for all z G C. Note that 
we can add 'padding zeroes' to the end of the coefficient vectors without changing the corresponding 
polynomial. 

The convolution theorem relates convolutions to Fourier transforms. We only use a restricted form. 

Theorem 4 (Convolution Theorem). Let a, b G C" and c :=a*b G C m , where m = 2n—l. Pad a and b 
by setting 

a = (a ,ai,-- - ,a„_i,0,--- ,0),b' = (b Q ,b u --- ,b n -i,0,--- ,0) G C m . 
Then, for every i with < i < m — 1, 

di = a'ib'i. 

The convolution theorem gives us a fast method of computing convolutions and, thus, of multiplying 
polynomials. Given a,b G C, we can calculate c = a*b using only 0(«log(«)) arithmetic operations as 
follows. 



The Fast Convolution Algorithm 



1. 


Input : 


a,beC n 


2. 


Output : 


c = a*beC m 


3. 


Set k = 


[log 2 (2n-l)l and m = 2 k . 


4. 


# Pad a 


and b so they are in C" . 


5. 


Set a' = 


(a ,ai,--- ,a„_i,0,--- ,0),# = (&o,&i 


6. 


Compute 


a' = FFT(fc,a') and P = FFT(A:,&') • 


7. 


For 0< 


i <m—l , set c; = a'/Z?',- . 


8. 


Compute 


c = FFT" 1 (A:,c). 



2.4 The Schonhage-Strassen Algorithm 

The Schonhage-Strassen algorithm multiplies two integers by convolving them and then performing 
carrys. Let two base-b integer representations be x and y. We consider the digits as the coefficients of 
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two polynomials. Then x = f x {b), y = f y (b) and 

xy = fx(b)f y (b) =f x *y{b)- 

So, to compute xy, we can first compute x*y in 0{n\og{n)) steps and then we can evaluate f x * y {b). The 
evaluation of f x * y {b) to yield an integer representation z is simply the process of performing carry s. 



The Schonhage-Strassen Algorithm 

1 . Input : x G Z£ and y G Z£ 

2 . Output : zeZf 1 # z = xy 

3. Compute x*y using the fast convolution Algorithm. 

4. Set c = 0. #carry 

5. For i = up to In — 1 do { 

6. Set z,- = ((x*y) i + c) mod 

7. Set c= L((**.y) ; - + c)/6j . 

8. }. 

9. Set Z2n-\ =c. 

Clearly the Schonhage-Strassen algorithm performs the multiplication using 0{n\og{n)) complex arith- 
metic operations. 

When finite-precision complex arithmetic is done, rounding errors are introduced. However, this can 
be countered: We know that x* y must be a vector of integers. As long as the rounding errors introduced 
are sufficiently small, we can round to the nearest integer and obtain the correct result. Schonhage and 
Strassen (H proved that 0(log(&«))-bit floating point numbers give sufficient precision. 

2.5 Interval Arithmetic 

Our rigorous extension of the algorithm uses containment sets. By replacing all complex numbers with 
complex containment sets, we can modify the Schonhage-Strassen algorithm to find a containment set 
of x* y; if the containment set only contains one integer- valued vector, then we can be certain that this 
is the correct value. We have used rectangular containment sets of machine-representable floating-point 
intervals with directed rounding to guarantee the desired integer product. A brief overview of the needed 
interval analysis 0] is given next. 

Let x,x be real numbers with x<x. Let [x,x] = {x G M : x < x < x} be a closed and bounded real 
interval and let the set of all such intervals be M = {[x,x] : x < x;x,x G M.}. Note that Id since 
we allow thin or punctual intervals with x = x. If ★ is one of the arithmetic operators +,—,-,/, we 
define arithmetic over operands in M by [a, a] *\b,b] := {a-kb : a G [a,a],b G [^,^]}, with the exception 
that [a, a] /[£,&] is undefined if G [b,b]. Due to continuity and monotonicity of the operations and 
compactness of the operands, arithmetic over M is given by real arithmetic operations with the bounds: 

[a, a] + \b, b] = [a + b, a + b] 
[a, a] — \b, b] = [a — b,a — b\ 

[a,a] ■ [b,b] = [mm{ab,ab,ab,ab},max{ab,ab,ab,ab}] 

[a,a]/[b,b] = \a,a] ■ [1/b, l/b], if g \b,b] . 
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In addition to the above elementary operations over elements in WL, our algorithm requires us to contain 
the range of the square root function over elements in ERfl [0,°°). Once again, due to the monotonicity 
of the square root function over non-negative reals it suffices to work with the real image of the bounds 
\/[x,x] = [y/x, V*], if < x. To complete the requirements for our rigorous extension of the Schonhage- 
Strassen algorithm we need to extend addition, multiplication and division by a non-zero integer to 
elements in 

IC := {\z,z] := |zi,Zr]+%,S] : fei,2T], \ziM G «} • 

Interval arithmetic over M naturally extends to IC, the set of rectangular complex intervals. Addition 
and subtraction over \z,z], \w,w] G IC given by 

\Z,Z] ± \W,W] = ([Z]_,ZT] ± +i([Z2,Z2] ± [W2,WJ]) 

are sharp but not multiplication or division due to rectangular wrapping effects. Complex interval multi- 
plication and division of a complex interval by a non-negative integer can be contained with real interval 
multiplications given by 

\z,z] ■ [w,W] = (\zi,Zl] ■ - \Z2,Z2] ■ [W2,W2]) +i([Z]_,Zl] • [W2,W^\ + [Z2,Z2[ • 

See (3l for details about how C-XSC manipulates rectangular containment sets over IK and IC. 



3 Results 

We have implemented the Schonhage-Strassen algorithm, our containment-set version with rectangu- 
lar complex intervals and long multiplication in C++ using the C-XSC library Our implementation 
is available at nttp : //www . math . canterbury . ac . nz/~r . sainudiin/codes/capa/multiply/| El 
Results show that, using base 256, our version of the algorithm is usually able to guarantee correct 
answers for up to 75,000-digit numbers. 

The following graph compares the speed of long multiplication (labelled 'Long multiplication'), the 
conventional Schonhage-Strassen algorithm with different underlying data types (the implementation 
using the C-XSC complex data type is the line labelled 'complex naive SS' and the one using our own 
implementation of complex numbers based on the C++ double data type is labelled 'double naive SS') 
and our containment-set version ('cinterval extended SS') on uniformly-random rc-digit base-256 inputs. 
All tests were performed on a 2.2 GHz 64-bit AMD Athlon 3500+ Processor running Ubuntu 9.04 using 
C-XSC version 2.2.4andgcc version 4.3.1. Times were recorded using the C++ clock() function — that 
is to say, CPU time was recorded. Note that only the 'Long multiplication' and 'cinterval extended SS' 
implementations are guaranteed to produce correct results. The 'double naive SS' and 'complex naive 
SS' implementations may have produced erroneous results, as the implementations do not necessarily 
provide sufficient precision; these are still shown for comparison. Note also that by 'naive' we mean 
that these implementations use fixed-precision floating-point arithmetic, whereas the 'real' Schonhage- 
Strassen algorithm uses variable-precision, which is much slower. 



Please also download the C-XSC library from http : //www. math. uni-wuppertal . de/~xsc/ 
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The above graph shows that the Schonhage-Strassen algorithm is much more efficient than long mul- 
tiplication for large inputs. However, our modified version of the algorithm is slower than the naive 
Schonhage-Strassen algorithm. We believe that C-XSC is not well-optimised; for example, their punctual 
complex data type (used in the 'complex naive SS' implementation) is much slower than our double- 
based complex data type (used in the 'double naive SS' implementation), even though ostensibly they 
are the same thing. We see that the C-XSC cinterval type (used in the 'cinterval extended SS' im- 
plementation) is about three times as slow as the complex type. This leaves the possibility that a more 
optimised implementation of containment sets would be able to compete with commercial algorithms. 
Investigations such as have shown that the Naive Schonhage-Strassen algorithm is able to compete 
with commercial implementations. 

Note that the "steps" seen in the graph can be explained by the fact that the algorithm will always 
round the size up to the nearest power of two. Thus there are steps at the powers of two. The most 
important feature of our results is the range of input sizes for which our algorithm successfully determines 
the answer. Using only standard double-precision IEEE floating-point numbers, we are able to use the 
algorithm to multiply 75,000-digit, or 600,000-bit, integers; this range is more than sufficient for most 
applications, and at this point the second Schonhage-Strassen algorithm will become competitive. 



4 Conclusion 

Our investigation has demonstrated that the Schonhage-Strassen algorithm with containment sets is a 
practical algorithm that could be used reliably for applications such as cryptography. Schohage and 
Strassen never showed that their algorithm had any practical value. However, as our implementation was 
not optimised, this is more of a feasibility study than a finished product. 

Note that the advantage of our algorithm over the original Schonhage-Strassen algorithm is that we 
make use of hardware-based floating-point arithmetic, whereas the original is designed to make use of 
much slower software-based arithmetic. Both the original algorithm and our adaptation always produce 
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correct results. However, we use a different approach to guaranteeing correctness. The naive algorithms 
we mention are not guaranteed to be correct because they are modifications of the Schonhage-Strassen 
algorithm which does not take measures to ensure correctness — they simply use fixed-precision floating- 
point arithemtic and hope for the best; these are only useful for speed comparisons. 

It remains to optimise our implementation of the algorithm to compete with commercial libraries. 
This is dependent on a faster implementation of interval arithmetic. It may also be interesting to use 
circular containment sets rather than rectangular containment sets. The advantage of circular containment 
sets is that they easily deal with complex rotations — that is, multiplying by e ld ; this is in fact the only 
type of complex multiplication (other than division by an integer) that our algorithm performs. 

References 

[1] Martin Fiirer (2007): Faster Integer Multiplication. In: 39th ACM STOC, San Diego, California, USA, pp. 
57-66. 

[2] Pierrick Gaudry, Alexander Kruppa & Paul Zimmermann (2007): A gmp-based implementation of schdnhage- 
strassen's large integer multiplication algorithm. In: ISSAC '07: Proceedings of the 2007 international sym- 
posium on Symbolic and algebraic computation, ACM, New York, NY, USA, pp. 167-174. 

[3] Hofschuster & Kramer (2004): C-XSC 2.0: A C++ library for extended scientific computing. In: R Alt, 
A Frommer, RB Kearfott & W Luther, editors: Numerical software with result verification, Lecture notes in 
computer science 2991, Springer- Verlag, pp. 15-35. 

[4] Donald E. Knuth (1998): The Art of Computer Programming, 2. Addison-Wesley, 3 edition. 

[5] Ramon E. Moore, R. Baker Kearfott & Michael J. Cloud (2009): Introduction to Interval Analysis. Society 
for Industrial and Applied Mathematics, Philadelphia, PA, USA. 

[6] Colin Percival (2003): Rapid multiplication modulo the sum and difference of highly composite numbers. 
Math. Comput. 72(241), pp. 387-395. 

[7] R.L. Rivest, A. Shamir & L. Adleman (1977): A Method for Obtaining Digital Signatures and Public-Key 
Cryptosy stems. Communications of the ACM 21(2), pp. 120-126. 

[8] A. Schonhage & V. Strassen (1971): Schnelle Multiplikation grofier Zahlen (Fast Multiplication of Large 
Numbers). Computing: Archiv fur elektronisches Rechnen (Archives for electronic computing) 7, pp. 281— 
292. (German). 



